Rank | Count | Beginning |
---|---|---|
19077 | 5585 | Bu |
14260 | 1395 | Bir |
3305 | 1188 | Ama |
69044 | 891 | O |
4969 | 783 | Ancak |
31614 | 760 | Çünkü |
49387 | 740 | Her |
16452 | 687 | Biz |
24544 | 546 | Bunun |
12572 | 505 | Ben |
32483 | 499 | Daha |
93044 | 478 | Ve |
94536 | 414 | Yani |
89257 | 410 | Türkiye |
67712 | 395 | Ne |
55061 | 393 | İşte |
30690 | 387 | Çok |
81169 | 382 | Şimdi |
82790 | 381 | Son |
8787 | 371 | Ayrıca |
24313 | 348 | Bunu |
37346 | 339 | Eğer |
96370 | 338 | Yeni |
20945 | 336 | Bugün |
38833 | 333 | En |
74147 | 331 | Özellikle |
41225 | 327 | Fakat |
52627 | 323 | İlk |
84513 | 304 | Şu |
17111 | 302 | Bizim |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
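The same count can be reproduced outside the database. The following is a minimal sketch in Python, assuming the sentences are available as a plain list of strings. The function name `sentence_beginnings` and the sample sentences are illustrative only and are not part of the corpus tooling. Splitting on a single space and re-joining the first N pieces is intended to mirror what `substring_index(sentence, ' ', N)` returns.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Mirrors substring_index(sentence, ' ', n): everything before the
    n-th space, or the whole sentence if it has fewer than n spaces.
    """
    counts = Counter()
    for sentence in sentences:
        # Split on a single space (not arbitrary whitespace) to match the SQL.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by count, descending, like ORDER BY cnt DESC LIMIT.
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the corpus.
    sample = [
        "Bu bir deneme .",
        "Bu da bir deneme .",
        "Bir gün geldi .",
        "Ama olmadı .",
    ]
    for beginning, count in sentence_beginnings(sample, n=1):
        print(count, beginning)
```

Calling the function with `n=2`, `n=3`, or `n=4` yields the longer beginnings covered in the following subsections.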